Add dplyr tutorial port, clean up docs so documenter is happy #279
docs/src/dplyr.md
Outdated
## What is DataFramesMeta.jl?

DataFramesMeta.jl is a Julia package to transform and summarize tabular data. It provides a more convenient syntax to work with DataFrames from [DataFrames.jl](https://github.com/JuliaData/DataFrames.jl). For a deeper explanation of DataFramesMeta.jl, see the [documentation](https://github.com/JuliaData/DataFramesMeta.jl).
maybe add that this is a DSL. The syntax is more convenient at the cost of syntax not being valid Julia code.
On the other hand DataFramesMeta.jl concepts try to mirror DataFrames.jl concepts (which is important I think for learning and using both)
Should be clearer now.
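To illustrate the point in this thread about the macros mirroring DataFrames.jl concepts, here is a minimal sketch. It assumes a small made-up table `df` (not part of the tutorial) and compares a DataFramesMeta.jl macro call with a roughly equivalent plain DataFrames.jl call.

```julia
using DataFrames, DataFramesMeta

# Hypothetical toy data, used only for illustration
df = DataFrame(x = [1, 2, 3])

# DataFramesMeta.jl form: columns are referenced as Symbols inside the macro
a = @transform(df, :y = :x .+ 1)

# Roughly equivalent DataFrames.jl form: source => function => destination
b = transform(df, :x => (x -> x .+ 1) => :y)

# Both calls produce the same table
same = a == b
```

The macro form is shorter, but `:y = :x .+ 1` only has this meaning inside the macro, which is the trade-off the comment describes.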
docs/src/dplyr.md
Outdated
Like dplyr, the DataFramesMeta.jl package contains a set of macros (or "verbs") that perform common data manipulation operations such as filtering for rows, selecting specific columns, re-ordering rows, adding new columns and summarizing data.

In addition, DataFramesMeta.jl contains a useful operation `@combine` to perform another common task which is the "split-apply-combine" concept. We will discuss that in a little bit.
this sentence is not clear to me and seems more detailed than the previous. Especially as in the previous you have written "summarizing data".
Also - if you keep this maybe give a link to "split-apply-combine" so people reading it know what we mean (not all of them might know it)
Hopefully it is clearer now.
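For readers unfamiliar with "split-apply-combine", a minimal sketch of what `@combine` does on grouped data may help. The table `df` and its columns `g` and `x` are made up for illustration and do not come from the tutorial.

```julia
using DataFrames, DataFramesMeta

# Hypothetical toy data: a grouping column and a value column
df = DataFrame(g = ["a", "a", "b"], x = [1, 2, 3])

# Split: partition the rows by the values of :g
gd = groupby(df, :g)

# Apply and combine: compute one summary per group and stack the results
res = @combine(gd, :total = sum(:x))
```

Here `res` has one row per group, with the group key in `g` and the per-group sum in `total`.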
docs/src/dplyr.md
Outdated
# Important DataFramesMeta.jl Verbs To Remember

dplyr verbs | Description
why do you call them dplyr verbs?
The base tutorial this came from uses the term "verb". I think the author likes the term because it sounds less technical than "function".
I am OK with "verb"; I am not clear why you use the term "dplyr" - it seems these are DataFramesMeta.jl verbs.
Oh that was a typo, sorry.
docs/src/dplyr.md
Outdated
`@combine` | summarise values
`groupby` | allows for group operations in the "split-apply-combine" concept

DataFramesMeta.jl also provides `@rselect`, `@rsubset`, `@rorderby`, and `@rtransform` for operations which act row-wise. We will explore the distinction between column-wise and row-wise transformations later in this tutorial.
maybe use term "whole-column" rather than "column-wise"? Alan Edelman was confused by "col-wise" (as it seems that one operation works vertically and the other horizontally which is not the case)
good point. Hopefully the language is clearer.
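The "whole-column" versus row-wise distinction discussed here can be sketched as follows. This assumes a made-up table `df` with a single column `x`; the new column names `share` and `label` are hypothetical.

```julia
using DataFrames, DataFramesMeta

# Hypothetical toy data
df = DataFrame(x = [1, 2, 3, 4])

# Whole-column: inside @transform, :x is the entire column vector,
# so sum(:x) sees all rows at once
whole = @transform(df, :share = :x ./ sum(:x))

# Row-wise: inside @rtransform, :x is a single value from the current row,
# so scalar operations such as a ternary work without broadcasting
rowwise = @rtransform(df, :label = :x > 2 ? "high" : "low")
```

Neither form works "horizontally" across columns; the difference is only whether the expression receives a full column or one element of it.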
docs/src/dplyr.md
Outdated
```
sleepData = @select msleep :name :sleep_total
```

To select all the columns *except* a specific column, use the `Not` function for inverse selection. We preface the `Not` with `$` because it does not reference a column directly as a `Symbol`.
The explanation of `$` is not clear. The reader cannot tell what would happen if you skipped `$`.
Fixing this. But we should merge a PR special-casing `Not`, `Between`, `Regex`, and `r"..."` so we don't have to worry about this.
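As a concrete reference for this thread, here is a minimal sketch of the escaped `Not` selection under discussion. It uses a small made-up stand-in for `msleep` (the values are illustrative, not the real dataset).

```julia
using DataFrames, DataFramesMeta

# Hypothetical stand-in for the msleep table
df = DataFrame(
    name = ["Cheetah", "Cow"],
    genus = ["Acinonyx", "Bos"],
    sleep_total = [12.1, 4.0],
)

# $(...) escapes the expression, so Not(:name) is passed through
# as a column selector instead of being parsed as a column reference
no_name = @select(df, $(Not(:name)))

remaining = names(no_name)
```

The result keeps every column except `name`.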
log.txt
Outdated
@@ -0,0 +1,11 @@
Doctests: DataFramesMeta: Test Failed at /home/peterwd/.julia/packages/Documenter/oBZFM/src/Documenter.jl:870
I would not put this log file in the PR
Thanks for the review! Should be much improved now.
```
@select msleep $varnames
```

Similarly, to select the first column, use the syntax `$1`.
Is `$` required here?
Yes.
Right now, the parsing for selecting columns is exactly the same as working with anonymous functions. So since `@transform df :y = :x .+ 1` would be ambiguous if we allowed `1` to be a column selector in the anonymous function, we need the same thing when doing `select`.
Not ideal, though. We can change this before 1.0.
I guess it is :).
I think `$1` makes sense - it just probably should be well explained somewhere.
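To make the ambiguity in this thread concrete, here is a minimal sketch contrasting an escaped `$1` and `$varnames` with a bare literal `1`. The table is a made-up stand-in for `msleep`, and the new column name `y` is hypothetical.

```julia
using DataFrames, DataFramesMeta

# Hypothetical stand-in for the msleep table
df = DataFrame(
    name = ["Cheetah", "Cow"],
    genus = ["Acinonyx", "Bos"],
    sleep_total = [12.1, 4.0],
)

# $varnames: the variable holds the column names to select
varnames = [:name, :sleep_total]
by_vars = @select(df, $varnames)

# $1: the escaped integer is treated as a column position
first_col = @select(df, $1)

# A bare 1 stays an ordinary number, which is why the escape is needed
plus_one = @transform(df, :y = :sleep_total .+ 1)
```

Because `1` in `:sleep_total .+ 1` must remain a number, a position-based selector has to be marked explicitly with `$`.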
Fix bug disallowing `@rsubset(df, :a, :b, :c)`
Thanks!